{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "978146e2",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/huggingface.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f717d3d4-942b-4d86-9435-fc44b3ac6d39",
   "metadata": {},
   "source": [
    "# Hugging Face LLMs\n",
    "\n",
    "There are many ways to interface with LLMs from [Hugging Face](https://huggingface.co/).\n",
    "Hugging Face itself provides several Python packages to enable access,\n",
    "which LlamaIndex wraps into `LLM` entities:\n",
    "\n",
    "- The [`transformers`](https://github.com/huggingface/transformers) package:\n",
    "  use `llama_index.llms.HuggingFaceLLM`\n",
    "- The [Hugging Face Inference API](https://huggingface.co/inference-api),\n",
    "  [wrapped by `huggingface_hub[inference]`](https://github.com/huggingface/huggingface_hub):\n",
    "  use `llama_index.llms.HuggingFaceInferenceAPI`\n",
    "\n",
    "There are _many_ possible permutations of these two, so this notebook only details a few.\n",
    "Let's use Hugging Face's [Text Generation task](https://huggingface.co/tasks/text-generation) as our example."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90cf0f2e-8d8d-4e42-81bf-866c759221e1",
   "metadata": {},
   "source": [
    "In the below line, we install the packages necessary for this demo:\n",
    "\n",
    "- `transformers[torch]` is needed for `HuggingFaceLLM`\n",
    "- `huggingface_hub[inference]` is needed for `HuggingFaceInferenceAPI`\n",
    "- The quotes are needed for Z shell (`zsh`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f413f179",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-huggingface"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b04b4a5-6fce-4188-a538-9a5ce2fa56f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"transformers[torch]\" \"huggingface_hub[inference]\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dac8f9f-7136-43f7-9e9f-de679e74d66e",
   "metadata": {},
   "source": [
    "Now that we're set up, let's play around:"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2c577674",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86028752",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0465029c-fe69-454a-9561-55f7a382b2e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from typing import List, Optional\n",
    "\n",
    "from llama_index.llms.huggingface import (\n",
    "    HuggingFaceInferenceAPI,\n",
    "    HuggingFaceLLM,\n",
    ")\n",
    "\n",
    "# SEE: https://huggingface.co/docs/hub/security-tokens\n",
    "# We just need a token with read permissions for this demo\n",
    "HF_TOKEN: Optional[str] = os.getenv(\"HUGGING_FACE_TOKEN\")\n",
    "# NOTE: None default will fall back on Hugging Face's token storage\n",
    "# when this token gets used within HuggingFaceInferenceAPI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a27feba3-d027-4d10-b1af-1e130e764a67",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This uses https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha\n",
    "# downloaded (if first invocation) to the local Hugging Face model cache,\n",
    "# and actually runs the model on your local machine's hardware\n",
    "locally_run = HuggingFaceLLM(model_name=\"HuggingFaceH4/zephyr-7b-alpha\")\n",
    "\n",
    "# This will use the same model, but run remotely on Hugging Face's servers,\n",
    "# accessed via the Hugging Face Inference API\n",
    "# Note that using your token will not charge you money,\n",
    "# the Inference API is free it just has rate limits\n",
    "remotely_run = HuggingFaceInferenceAPI(\n",
    "    model_name=\"HuggingFaceH4/zephyr-7b-alpha\", token=HF_TOKEN\n",
    ")\n",
    "\n",
    "# Or you can skip providing a token, using Hugging Face Inference API anonymously\n",
    "remotely_run_anon = HuggingFaceInferenceAPI(\n",
    "    model_name=\"HuggingFaceH4/zephyr-7b-alpha\"\n",
    ")\n",
    "\n",
    "# If you don't provide a model_name to the HuggingFaceInferenceAPI,\n",
    "# Hugging Face's recommended model gets used (thanks to huggingface_hub)\n",
    "remotely_run_recommended = HuggingFaceInferenceAPI(token=HF_TOKEN)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b801bef7-2593-49e2-a550-721e6b796486",
   "metadata": {},
   "source": [
    "Underlying a completion with `HuggingFaceInferenceAPI` is Hugging Face's\n",
    "[Text Generation task](https://huggingface.co/tasks/text-generation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "631269c9-38ca-49d2-a7f0-f88e21adef6e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " beyond!\n",
      "The Infinity Wall Clock is a unique and stylish way to keep track of time. The clock is made of a durable, high-quality plastic and features a bright LED display. The Infinity Wall Clock is powered by batteries and can be mounted on any wall. It is a great addition to any home or office.\n"
     ]
    }
   ],
   "source": [
    "completion_response = remotely_run_recommended.complete(\"To infinity, and\")\n",
    "print(completion_response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dda1be10",
   "metadata": {},
   "source": [
    "If you are modifying the LLM, you should also change the global tokenizer to match!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12e0f3c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import set_global_tokenizer\n",
    "from transformers import AutoTokenizer\n",
    "\n",
    "set_global_tokenizer(\n",
    "    AutoTokenizer.from_pretrained(\"HuggingFaceH4/zephyr-7b-alpha\").encode\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fa723d6-4308-4d94-9609-8c51ce8184c3",
   "metadata": {},
   "source": [
    "If you're curious, other Hugging Face Inference API tasks wrapped are:\n",
    "\n",
    "- `llama_index.llms.HuggingFaceInferenceAPI.chat`: [Conversational task](https://huggingface.co/tasks/conversational)\n",
    "- `llama_index.embeddings.HuggingFaceInferenceAPIEmbedding`: [Feature Extraction task](https://huggingface.co/tasks/feature-extraction)\n",
    "\n",
    "And yes, Hugging Face embedding models are supported with:\n",
    "\n",
    "- `transformers[torch]`: wrapped by `HuggingFaceEmbedding`\n",
    "- `huggingface_hub[inference]`: wrapped by `HuggingFaceInferenceAPIEmbedding`\n",
    "\n",
    "Both of the above two subclass `llama_index.embeddings.base.BaseEmbedding`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
